nLab
regular language
Contents
Context
( 0 , 1 ) (0,1) -Category theory
Linguistics
Linguistics
General
Philosophy of Language
Pragmatics?
Concepts
People
Contents
Idea
Regular languages are a simple kind of formal languages , the least expressive in the Chomsky hierarchy .
Definition
Fix a finite set V V (serving as the alphabet ) and denote its free monoid by V ⋆ V^\star . A language L ⊆ V ⋆ L \subseteq V^\star is regular whenever there is a formal grammar G = ( V , X , R , s ) G = (V, X, R, s) that generates it (i.e. L ( G ) = L L(G) = L ) such that all rules in R R are of the form x → w x ′ x \to w x' for a terminal w ∈ V w \in V and non-terminal symbols x , x ′ ∈ X x, x' \in X .
Properties
Regular languages can be characterised by regular expressions . Regular expressions are inductively generated by the following rules:
For each terminal w ∈ V w \in V , w w
Basic expressions ∅ \emptyset , ϵ \epsilon
For each pair of regular expressions R 1 , R 2 R_1,R_2 , R 1 | R 2 R_1|R_2 and R 1 ⋅ R 2 R_1 \cdot R_2
For each regular expression R R the Kleene star R ⋆ R^\star .
These can be given a semantics in the poset of languages P ( V ) P(V) by universal properties .
L ( ∅ ) = ∅ L(\emptyset) = \emptyset and L ( R 1 | R 2 ) = L ( R 1 ) ∪ L ( R 2 ) L(R_1 | R_2) = L(R_1) \cup L(R_2) . These are nullary and binary joins.
L ( w ) = { w } L(w) = \{ w \} . These are representable viewing languages as de-categorified presheaves .
L ( ϵ ) L(\epsilon) consists of only the empty string, L ( R 1 ⋅ R 2 ) = { α ∈ V ⋆ | ∃ β ∈ L ( R 1 ) , γ ∈ L ( R 2 ) ⋅ α = β γ } L(R_1\cdot R_2) = \{\alpha \in V^\star \vert \exists \beta \in L(R_1), \gamma \in L(R_2) \cdot \alpha = \beta \gamma \} . This is a de-categorified version of the Day convolution monoidal structure.
L ( R ⋆ ) = { α ∈ V ⋆ | ∃ n ∈ ℕ . ∃ β 0 , … , β n − 1 ∈ L ( R ) . α = β 0 ⋯ β n − 1 } L(R^\star) = \{ \alpha \in V^\star \vert \exists n \in \mathbb{N}. \exists\beta_0,\ldots,\beta_{n-1} \in L(R). \alpha = \beta_0\cdots\beta_{n-1} \} . In category theoretic terms this is the least solution to the recurrence equation L ( R ⋆ ) = L ( ϵ | ( R ⋅ R ⋆ ) ) L(R^\star) = L(\epsilon| (R\cdot R^\star)) , and can be constructed by Kleene's fixed point theorem as an ω \omega -colimit .
Every regular language can is denoted by some regular expression in this way.
Equivalently, a language is regular if it can be recognised by a finite-state automaton .
Last revised on March 16, 2023 at 07:41:11.
See the history of this page for a list of all contributions to it.